Expected Eligibility Traces
Authors
Abstract
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that could also have led to the current state. In this work, we introduce expected eligibility traces. Expected traces allow, with a single update, to update states and actions that could have preceded the current state, even if they did not do so on this occasion. We discuss when expected traces provide benefits over classic (instantaneous) traces in temporal-difference learning, and show that sometimes substantial improvements can be attained. We provide a way to smoothly interpolate between instantaneous and expected traces by a mechanism similar to bootstrapping, which ensures that the resulting algorithm is a strict generalisation of TD(λ). Finally, we discuss possible extensions and connections to related ideas, such as successor features.
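The contrast between instantaneous and expected traces can be made concrete with a small tabular sketch. The snippet below is an illustrative assumption rather than the paper's exact algorithm: it implements classic accumulating-trace TD(λ), and as a stand-in for an expected trace it keeps a per-state running average of the instantaneous trace (the names `td_lambda` and `transitions` and the averaging step size `beta` are hypothetical).

```python
import numpy as np

# Minimal tabular sketch contrasting instantaneous TD(lambda) with the
# expected-trace idea from the abstract. The expected-trace estimator here
# (a per-state running average of the instantaneous trace) is an
# illustrative assumption, not the paper's exact algorithm.

def td_lambda(transitions, n_states, alpha=0.1, gamma=0.99, lam=0.9,
              expected=False, beta=0.05):
    v = np.zeros(n_states)              # value estimates
    e = np.zeros(n_states)              # instantaneous eligibility trace
    z = np.zeros((n_states, n_states))  # z[s] approximates E[e | S_t = s]

    for s, r, s_next, done in transitions:
        e = gamma * lam * e             # decay credit for older states
        e[s] += 1.0                     # accumulate credit for the current state
        td_error = r + (0.0 if done else gamma * v[s_next]) - v[s]

        if expected:
            # Running average of the trace conditioned on the current state;
            # a single TD error then credits all states that tend to precede s,
            # not only those visited on this particular trajectory.
            z[s] += beta * (e - z[s])
            v += alpha * td_error * z[s]
        else:
            v += alpha * td_error * e   # classic TD(lambda) update

        if done:
            e[:] = 0.0                  # instantaneous traces reset per episode
    return v
```

With `expected=True`, each update propagates credit to counterfactual predecessors of the current state, which is the source of the data-efficiency gains the abstract describes.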
Similar resources
Bidding Strategy on Demand Side Using Eligibility Traces Algorithm
Restructuring in the power industry splits it into separate parts and creates competition between the purchasing and selling sections. As a consequence, through active participation in the energy market, service-provider companies and large consumers create a context for overcoming the problems that result from a lack of demand-side participation in the market. The most prominent ch...
Eligibility Traces for Off-Policy Policy Evaluation
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference methods. Here we generalize eligibility traces to off-policy learning, in which one learns about a policy different from the policy that generates the data. Off-policy methods can greatly multiply learning, as many policie...
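A common way to realise off-policy traces of this kind is per-decision importance sampling, where trace updates are scaled by the ratio of target-policy to behaviour-policy probabilities. The sketch below is one hedged tabular variant, not necessarily the exact algorithm of the cited paper; the transition format carrying `pi_prob` and `b_prob` is an assumption for illustration.

```python
import numpy as np

# Hedged sketch of off-policy TD(lambda) for policy evaluation with
# per-decision importance sampling. Tabular setting and names are
# assumptions; this is one standard variant, not the cited paper's exact method.

def off_policy_td_lambda(transitions, n_states, alpha=0.1, gamma=0.99, lam=0.9):
    v = np.zeros(n_states)
    e = np.zeros(n_states)
    # Each transition carries pi_prob / b_prob: the probabilities the target
    # policy pi and the behaviour policy b assign to the action actually taken.
    for s, r, s_next, done, pi_prob, b_prob in transitions:
        rho = pi_prob / b_prob          # importance-sampling ratio
        e = rho * gamma * lam * e       # reweight and decay past credit
        e[s] += rho                     # credit the current state, reweighted
        td_error = r + (0.0 if done else gamma * v[s_next]) - v[s]
        v += alpha * td_error * e
        if done:
            e[:] = 0.0
    return v
```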
Recursive Least-Squares Learning with Eligibility Traces
In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We describe a systematic approach for adapting on-policy learning least squares algorithms of the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning w...
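As one concrete member of this least-squares family, a rough LSTD(λ) sketch is shown below. It is a simplified on-policy illustration under stated assumptions, not the off-policy adaptation the paper describes; the feature lookup `features` and the regulariser `eps` are illustrative.

```python
import numpy as np

# Rough on-policy LSTD(lambda) sketch: accumulate least-squares statistics
# weighted by an eligibility trace over feature vectors, then solve for the
# linear value-function weights. Names and setup are illustrative assumptions.

def lstd_lambda(transitions, features, n_features, gamma=0.99, lam=0.9, eps=1e-6):
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    e = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        phi = features[s]
        phi_next = np.zeros(n_features) if done else features[s_next]
        e = gamma * lam * e + phi                 # trace over feature vectors
        A += np.outer(e, phi - gamma * phi_next)  # accumulate statistics
        b += e * r
        if done:
            e[:] = 0.0
    # Solve A theta = b; the eps * I term guards against a singular A.
    return np.linalg.solve(A + eps * np.eye(n_features), b)
```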
iLSTD: Eligibility Traces and Convergence Analysis
We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n²), where n is the number of...
Evidence for eligibility traces in human learning
Whether we prepare a coffee or navigate to a shop: in many tasks we make multiple decisions before reaching a goal. Learning such state-action sequences from sparse reward raises the problem of credit assignment: which actions out of a long sequence should be reinforced? One solution provided by reinforcement learning (RL) theory is the eligibility trace (ET); a decaying memory of the state-acti...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i11.17200